Hiroshi Saruwatari

Fast Multichannel NMF with Block-Diagonal Spatial Covariance Matrices for Efficient Blind Source Separation Using Distributed Microphone Arrays

May 19, 2026

Kinetic-Optimal Scheduling with Moment Correction for Metric-Induced Discrete Flow Matching in Zero-Shot Text-to-Speech

May 10, 2026

DialogueSidon: Recovering Full-Duplex Dialogue Tracks from In-the-Wild Dialogue Audio

Apr 13, 2026

Geneses: Unified Generative Speech Enhancement and Separation

Jan 26, 2026

Emotional Text-To-Speech Based on Mutual-Information-Guided Emotion-Timbre Disentanglement

Oct 02, 2025

Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis

May 18, 2025

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features

Dec 26, 2024

DNN-based ensemble singing voice synthesis with interactions between singers

Sep 16, 2024

The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech

Sep 14, 2024

Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT

Sep 11, 2024